{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2d5f1205",
   "metadata": {},
   "source": [
    "# SQL Translate\n",
    "\n",
    "This example will show you how to translate natural language to SQL queries, the original example is on [OpenAI Example](https://platform.openai.com/examples/default-sql-translate), the difference is that we will teach you how to cache the response for exact and similar matches with **gptcache**, it will be very simple, you just need to add an extra step to initialize the cache.\n",
    "\n",
    "Before running the example, make sure the `OPENAI_API_KEY` environment variable is set by executing `echo $OPENAI_API_KEY`. If it is not already set, it can be set by using `export OPENAI_API_KEY=YOUR_API_KEY` on Unix/Linux/MacOS systems or `set OPENAI_API_KEY=YOUR_API_KEY` on Windows systems.\n",
    "\n",
    "Then we can learn the usage and acceleration effect of gptcache by the following code, which consists of three parts, the original openai way, the exact search and the similar search."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dd7eb8a",
   "metadata": {},
   "source": [
    "## OpenAI API original usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6fe19528",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "SELECT  d.name \n",
      "FROM Department d \n",
      "INNER JOIN Employee e ON d.id = e.department_id \n",
      "INNER JOIN Salary_Payments sp ON e.id = sp.employee_id \n",
      "WHERE sp.date > NOW() - INTERVAL '3 months' \n",
      "GROUP BY d.name \n",
      "HAVING COUNT(*) > 10\n",
      "Time consuming: 2.78s\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "import openai\n",
    "\n",
    "question = \"A query to list the names of the departments which employed more than 10 employees in the last 3 months\\nSELECT\"\n",
    "def response_text(openai_resp):\n",
    "    return openai_resp[\"choices\"][0][\"text\"]\n",
    "\n",
    "start_time = time.time()\n",
    "response = openai.Completion.create(\n",
    "  model=\"text-davinci-003\",\n",
    "  prompt=\"### Postgres SQL tables, with their properties:\\n#\\n# Employee(id, name, department_id)\\n# Department(id, name, address)\\n# Salary_Payments(id, employee_id, amount, date)\\n#\\n### \" + question,\n",
    "  temperature=0,\n",
    "  max_tokens=150,\n",
    "  top_p=1.0,\n",
    "  frequency_penalty=0.0,\n",
    "  presence_penalty=0.0,\n",
    "  stop=[\"#\", \";\"]\n",
    ")\n",
    "print(\"\\nSELECT\", response_text(response))\n",
    "print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37112300",
   "metadata": {},
   "source": [
    "## OpenAI API + GPTCache, exact match cache\n",
    "\n",
    "Initalize the cache to run GPTCache and import `openai` form `gptcache.adapter`, which will automatically set the map data manager to match the exact cahe, more details refer to [build your cache](https://gptcache.readthedocs.io/en/dev/usage.html#build-your-cache)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "7b5a627c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cache loading.....\n",
      "\n",
      "SELECT  d.name \n",
      "FROM Department d \n",
      "INNER JOIN Employee e ON d.id = e.department_id \n",
      "INNER JOIN Salary_Payments sp ON e.id = sp.employee_id \n",
      "WHERE sp.date > NOW() - INTERVAL '3 months' \n",
      "GROUP BY d.name \n",
      "HAVING COUNT(*) > 10\n",
      "Time consuming: 1.88s\n",
      "\n",
      "SELECT  d.name\n",
      "FROM Department d\n",
      "INNER JOIN Employee e ON d.id = e.department_id\n",
      "INNER JOIN Salary_Payments sp ON e.id = sp.employee_id\n",
      "WHERE sp.date > NOW() - INTERVAL '3 months'\n",
      "GROUP BY d.name\n",
      "HAVING COUNT(*) > 10\n",
      "Time consuming: 1.46s\n",
      "\n",
      "SELECT  d.name\n",
      "FROM Department d\n",
      "INNER JOIN Employee e ON d.id = e.department_id\n",
      "INNER JOIN Salary_Payments sp ON e.id = sp.employee_id\n",
      "WHERE sp.date > DATE_SUB(CURDATE(), INTERVAL 3 MONTH)\n",
      "GROUP BY d.name\n",
      "HAVING COUNT(*) > 10\n",
      "Time consuming: 2.08s\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "def response_text(openai_resp):\n",
    "    return openai_resp[\"choices\"][0][\"text\"]\n",
    "\n",
    "print(\"Cache loading.....\")\n",
    "\n",
    "# To use GPTCache, that's all you need\n",
    "# -------------------------------------------------\n",
    "from gptcache import cache\n",
    "from gptcache.processor.pre import get_prompt\n",
    "\n",
    "cache.init(pre_embedding_func=get_prompt)\n",
    "cache.set_openai_key()\n",
    "# -------------------------------------------------\n",
    "\n",
    "questions = [\n",
    "    \"A query to list the names of the departments which employed more than 10 employees in the last 3 months\\nSELECT\",\n",
    "    \"Query the names of the departments which employed more than 10 employees in the last 3 months\\nSELECT\",\n",
    "    \"List the names of the departments which employed more than 10 employees in the last 3 months\\nSELECT\",\n",
    "]\n",
    "\n",
    "\n",
    "for question in questions:\n",
    "    start_time = time.time()\n",
    "    response = openai.Completion.create(\n",
    "      model=\"text-davinci-003\",\n",
    "      prompt=\"### Postgres SQL tables, with their properties:\\n#\\n# Employee(id, name, department_id)\\n# Department(id, name, address)\\n# Salary_Payments(id, employee_id, amount, date)\\n#\\n### \" + question,\n",
    "      temperature=0,\n",
    "      max_tokens=150,\n",
    "      top_p=1.0,\n",
    "      frequency_penalty=0.0,\n",
    "      presence_penalty=0.0,\n",
    "      stop=[\"#\", \";\"]\n",
    "    )\n",
    "    print(\"\\nSELECT\", response_text(response))\n",
    "    print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b57d718a",
   "metadata": {},
   "source": [
    "## OpenAI API + GPTCache, similar search cache\n",
    "\n",
    "Set the cache with `pre_embedding_func` to preprocess the input data, `embedding_func` to generate embedding for the text, and `data_manager` to manager the cache data, `similarity_evaluation` to evaluate the similarities, more details refer to [build your cache](https://gptcache.readthedocs.io/en/dev/usage.html#build-your-cache)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4018bb39",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cache loading.....\n",
      "A query to list the names of the departments which employed more than 10 employees in the last 3 months\n",
      "SELECT  d.name \n",
      "FROM Department d \n",
      "INNER JOIN Employee e ON d.id = e.department_id \n",
      "INNER JOIN Salary_Payments sp ON e.id = sp.employee_id \n",
      "WHERE sp.date > NOW() - INTERVAL '3 months' \n",
      "GROUP BY d.name \n",
      "HAVING COUNT(*) > 10\n",
      "Time consuming: 1.89s\n",
      "Query the names of the departments which employed more than 10 employees in the last 3 months\n",
      "SELECT  d.name \n",
      "FROM Department d \n",
      "INNER JOIN Employee e ON d.id = e.department_id \n",
      "INNER JOIN Salary_Payments sp ON e.id = sp.employee_id \n",
      "WHERE sp.date > NOW() - INTERVAL '3 months' \n",
      "GROUP BY d.name \n",
      "HAVING COUNT(*) > 10\n",
      "Time consuming: 0.25s\n",
      "List the names of the departments which employed more than 10 employees in the last 3 months\n",
      "SELECT  d.name \n",
      "FROM Department d \n",
      "INNER JOIN Employee e ON d.id = e.department_id \n",
      "INNER JOIN Salary_Payments sp ON e.id = sp.employee_id \n",
      "WHERE sp.date > NOW() - INTERVAL '3 months' \n",
      "GROUP BY d.name \n",
      "HAVING COUNT(*) > 10\n",
      "Time consuming: 0.22s\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "def response_text(openai_resp):\n",
    "    return openai_resp[\"choices\"][0][\"text\"]\n",
    "\n",
    "from gptcache import cache\n",
    "from gptcache.adapter import openai\n",
    "from gptcache.embedding import Onnx\n",
    "from gptcache.processor.pre import get_prompt\n",
    "from gptcache.manager import CacheBase, VectorBase, get_data_manager\n",
    "from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n",
    "\n",
    "print(\"Cache loading.....\")\n",
    "\n",
    "onnx = Onnx()\n",
    "data_manager = get_data_manager(CacheBase(\"sqlite\"), VectorBase(\"faiss\", dimension=onnx.dimension))\n",
    "cache.init(pre_embedding_func=get_prompt,\n",
    "    embedding_func=onnx.to_embeddings,\n",
    "    data_manager=data_manager,\n",
    "    similarity_evaluation=SearchDistanceEvaluation(),\n",
    "    )\n",
    "cache.set_openai_key()\n",
    "\n",
    "questions = [\n",
    "    \"A query to list the names of the departments which employed more than 10 employees in the last 3 months\\nSELECT\",\n",
    "    \"Query the names of the departments which employed more than 10 employees in the last 3 months\\nSELECT\",\n",
    "    \"List the names of the departments which employed more than 10 employees in the last 3 months\\nSELECT\",\n",
    "]\n",
    "\n",
    "\n",
    "for question in questions:\n",
    "    start_time = time.time()\n",
    "    response = openai.Completion.create(\n",
    "      model=\"text-davinci-003\",\n",
    "      prompt=\"### Postgres SQL tables, with their properties:\\n#\\n# Employee(id, name, department_id)\\n# Department(id, name, address)\\n# Salary_Payments(id, employee_id, amount, date)\\n#\\n### \" + question,\n",
    "      temperature=0,\n",
    "      max_tokens=150,\n",
    "      top_p=1.0,\n",
    "      frequency_penalty=0.0,\n",
    "      presence_penalty=0.0,\n",
    "      stop=[\"#\", \";\"]\n",
    "    )\n",
    "    print(question, response_text(response))\n",
    "    print(\"Time consuming: {:.2f}s\".format(time.time() - start_time))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8666399e",
   "metadata": {},
   "source": [
    "We find that the performance improvement when searching the similar because the three statements of the query are similar, and hitting cache in gptcache, so it will return the cached results directly instead of requesting. And you can then also try running the query again for exact search, which will also speed up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6a69fef",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
